Abstract

We analyze the component evolution in inhomogeneous random 
intersection graphs when the average degree is close to 1. As the aver- 
age degree increases, the size of the largest component in the random 
intersection graph goes through a phase transition. We give bounds on 
the size of the largest components before and after this transition. We 
also prove that the largest component after the transition is unique. 
These results are similar to the phase transition in Erdős-Rényi random
graphs; one notable difference is that the jump in the size of the
largest component varies in size depending on the parameters of the 
random intersection graph. 
1 Introduction



The well-studied Erdős-Rényi graph, G(n,p), is a basic model for random
networks that is amenable to structural analysis. However, G(n,p) is not
well suited as a model for real-world networks; perhaps the most common
criticism is that sparse realizations of G(n,p) do not exhibit clustering [TT].
Thus G(n,p) is not a good model for most social networks, which are usually
sparse and have nontrivial clustering. In many cases this phenomenon
(sparsity together with clustering) is a result of the graph originating as the
intersection graph of a larger bipartite graph. For example, the well-known
collaboration graphs of scientists (or of movie actors) are derived from the
bipartite graph of scientists and papers (respectively, actors and movies) [2U HE].

A simple natural model for such networks is the random intersection
graph. Random intersection graphs were introduced by Karoński, Scheinerman
and Singer-Cohen [23, 15] and have recently attracted much attention
[U El Q3J EU 122]. We study the phase transition for components in
the inhomogeneous random intersection graph model defined by Nikoletseas,
Raptopoulos and Spirakis [HI [20]. Let $\mathbf{p} = (p_i)_{i=1}^m$ be a sequence of m
probabilities, V a set of n vertices and $A = \{a_1, a_2, \ldots, a_m\}$ a set of m attributes.
For $(v, a_i) \in V \times A$, define independent indicator random variables
$X_{v,a_i} \sim \mathrm{Bernoulli}(p_i)$. A random bipartite graph B is defined on the vertices V
and attributes A to contain exactly those edges $(v, a_i)$ for which $X_{v,a_i} = 1$. Finally,
the random intersection graph G is obtained from the bipartite graph B by
projecting onto the vertices V: two vertices are connected in G if they share
at least one common attribute in B.
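The construction above can be sketched in a few lines of Python. This is only an illustrative sampler, not code from the paper: the function name `sample_intersection_graph`, the representation of B as a list of attribute sets, and the `seed` parameter are assumptions made for the example.

```python
import random
from itertools import combinations


def sample_intersection_graph(n, probs, seed=None):
    """Sample the bipartite graph B and its projection G.

    n      -- number of vertices
    probs  -- list (p_1, ..., p_m); attribute j is attached to each vertex
              independently with probability probs[j]
    Returns (B, edges): B[v] is the set of attributes of vertex v, and
    edges is the set of pairs (u, v), u < v, sharing at least one attribute.
    """
    rng = random.Random(seed)
    # Independent Bernoulli(p_j) indicators X_{v, a_j}.
    B = [{j for j, pj in enumerate(probs) if rng.random() < pj} for _ in range(n)]
    # Vertices attached to each attribute; each such group is a clique in G.
    members = [[] for _ in probs]
    for v, attrs in enumerate(B):
        for j in attrs:
            members[j].append(v)
    edges = set()
    for clique in members:
        edges.update(combinations(clique, 2))  # vertices are listed in increasing order
    return B, edges


if __name__ == "__main__":
    B, edges = sample_intersection_graph(8, [0.3, 0.1, 0.5], seed=1)
    print(len(edges), "edges")
```

Building G attribute by attribute makes the clique structure explicit: attribute $a_j$ contributes a clique on the vertices attached to it.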

This paper is concerned with asymptotic results; for each n let $\mathbf{p}^{(n)}$ be a
vector of m = m(n) probabilities. This defines a sequence of random intersection
graphs indexed by n. We say an event $E_n$ holds with high probability
if $P[E_n] \to 1$ as $n \to \infty$. We show that depending on the sequences one
may observe larger or smaller jumps in the phase transition.

In previous work [HQS], the phase transition was located for random intersection
graphs defined with uniform probabilities. The component evolution
of inhomogeneous random intersection graphs has been studied for a different
model of random intersection graphs [3, 2]. The results in these papers used
tools developed by Bollobás, Janson and Riordanp] and are exact, though
they only consider those cases when the giant component is linear. In [6]
another general model of sparse random graphs is introduced and analyzed.
Behrisch [JJ studied the uniform homogeneous case when all $p_i = p = c/\sqrt{nm}$
and noted that if $p = \omega(1/n)$, then the largest component jumps from size
$O(np\log n)$ to $\Theta(p^{-1})$, a smaller jump than observed in Erdős-Rényi random
graphs. On the other hand, for $p = O(1/n)$ the largest component jumps
from size $O(\log n)$ to $\Theta(n)$, a jump similar to that in Erdős-Rényi random
graphs. Indeed for m large enough and $p_i = c/\sqrt{mn}$, the random intersection
model is equivalent to $G(n,\hat{p})$ |2T]. Our theorems show these phenomena
occurring in the more general setting of inhomogeneous random intersection
graphs as well.

Theorem 1.1. If $n\sum_{i=1}^m p_i^2 < 1$ then with high probability all components in G
will have size at most $O(\max\{np\log n, \log n\})$ where $p = \max_i p_i$.

Note that each attribute $a_i$ contributes a clique of expected size $np_i$ to the
random intersection graph; thus Theorem 1.1 is very close to best possible.

Theorem 1.2. If $n\sum_{i=1}^m p_i^2 = c > 1$ and there exists $\gamma > 1/2$ such that
$p = \max_i p_i = o(n^{-\gamma})$, then with high probability there exists a unique largest
component. This component will have size $(1 - \rho)n$ where $\rho$ is the unique
solution in [0, 1) to the equation

$$\rho = \prod_{i=1}^{m}\Big[1 - p_i\big(1 - (1 - p_i(1 - \rho))^n\big)\Big]. \tag{1.1}$$

All other components will have size of order $O(\max\{np\log n, \log n\})$.
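The fixed point in Theorem 1.2 can be approximated numerically. The sketch below assumes that Equation (1.1) has the form $\rho = \prod_{i=1}^m [1 - p_i(1 - (1 - p_i(1-\rho))^n)]$, which is the fixed-point equation of the multi-type branching process of Section 3.1, and that every $p_i < 1$. The function name `extinction_probability` and the tolerance parameters are hypothetical. Iterating from $\rho = 0$ gives a nondecreasing sequence converging to the smallest fixed point in [0, 1].

```python
import math


def extinction_probability(n, probs, tol=1e-12, max_iter=100_000):
    """Approximate the smallest fixed point in [0, 1] of the map
    rho -> prod_i [1 - p_i * (1 - (1 - p_i * (1 - rho))**n)].
    Assumes 0 <= p_i < 1 for every entry of probs."""

    def F(rho):
        log_val = 0.0
        for p in probs:
            inner = (1.0 - p * (1.0 - rho)) ** n          # (1 - p_i(1 - rho))^n
            log_val += math.log1p(-p * (1.0 - inner))     # log of the i-th factor
        return math.exp(log_val)

    rho = 0.0
    for _ in range(max_iter):
        nxt = F(rho)
        if abs(nxt - rho) < tol:
            return nxt
        rho = nxt
    return rho


if __name__ == "__main__":
    n, m, c = 200, 400, 2.0
    p = math.sqrt(c / (n * m))          # uniform case with n * sum(p_i^2) = c
    rho = extinction_probability(n, [p] * m)
    print("rho =", rho, " predicted giant component size ~", (1 - rho) * n)
```

Summing logarithms rather than multiplying the m factors directly keeps the product stable when m is large.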

Importantly, the unique largest component guaranteed by Theorem 1.2
is not necessarily linear in n. Under the conditions of Theorems 1.1 and
1.2 there is, however, a jump in the size of the largest component when
transitioning from the subcritical phase (when $n\sum p_i^2 < 1$) to the supercritical
phase (when $n\sum p_i^2 > 1$). Thus a phase transition is observed. The phase
transition is made apparent in comparing Theorems 1.1 and 1.3, though the
latter is not necessarily best possible.

As our model is quite general, our theorems do not always give the best
possible bounds. What is perhaps surprising is that despite the generality
of the model, we can locate the phase transition exactly. No assumptions
of uniformity nor convergence of the sequences are necessary; we only
require that $n\sum p_i^2$ be a constant. Because of this generality, there are many
cases where the solutions to Equation (1.1) do not converge as $n \to \infty$. In
such cases, even the order of magnitude of the unique largest component may
fluctuate. To compensate for this, we state a weaker version of Theorem 1.2
which gives a lower bound on the order of magnitude of the unique largest
component. We also show, in Proposition 3.3, how to use Theorem 1.2 to
derive the exact size of the largest component in the uniform (homogeneous)
case. In this way we recover exactly previously proved results using our more
general method [Tj ITS].

Theorem 1.3. If $n\sum_{i=1}^m p_i^2 = c > 1$ and there exists a $\gamma > 1/2$ such that
$p = \max_i p_i = o(n^{-\gamma})$, then with high probability there exists a unique giant
component. If $p^{-1} = o(n)$, this component will be of size at least $\Omega(p^{-1})$.
Otherwise it will be of size $\Theta(n)$. All other components will have size at most
$O(\max\{np\log n, \log n\})$.

Example. Let $m = n^{\alpha}$, $\alpha < 1$, and set $p_i = c/\sqrt{mn}$ for some constant c. If
c < 1 then by Theorem 1.1 each component has size at most $O(\sqrt{n/m}\,\log n)$.
On the other hand, if c > 1, there exists a unique largest component whose
size is $\Theta(nmp)$ by Theorem 1.2. See Proposition 3.3 for the derivation of the
exact bound. These bounds are the same as obtained by Behrisch [Tj.

Example. Let $m = \beta n$, and set $p_i = c/\sqrt{mn}$ for some constant c. If c < 1,
then each component has size at most $O(\log n)$. This is the same bound
as obtained in [16]. On the other hand, if c > 1, Theorem 1.2 implies the
existence of a unique largest linear component; in Proposition 3.3 the exact
size is derived. Here our bounds are the same as previously derived [16].

As is standard in the analysis of the phase transition of random graphs,
we will use both concentration results and the theory of branching processes,
specifically Galton-Watson processes. In the next two sections, we collect
the results we will use from these two areas. We do not provide proofs for
results which are either well known or easily derived from well-known results.



2 Concentration of Measure 

Recall the Chernoff inequality for $X \sim \mathrm{Bin}(n,p)$ with $t \ge 0$ (for a proof see
[13], Theorem 2.1):

$$P[X \ge EX + t] \le \exp\left(-\frac{t^2}{2(np + t/3)}\right). \tag{2.1}$$
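As a sanity check, the bound (2.1), read as $P[X \ge EX + t] \le \exp(-t^2/(2(np + t/3)))$, can be compared with exact binomial tails for small parameters. The helper names `binom_upper_tail` and `chernoff_bound` below are illustrative, not from the paper.

```python
import math


def binom_upper_tail(n, p, k):
    """Exact P[X >= k] for X ~ Bin(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def chernoff_bound(n, p, t):
    """Right-hand side of (2.1): exp(-t^2 / (2 (np + t/3)))."""
    return math.exp(-t * t / (2.0 * (n * p + t / 3.0)))


if __name__ == "__main__":
    n, p = 200, 0.05
    for t in (2, 5, 10, 20):
        k = math.ceil(n * p + t)        # smallest integer that is >= EX + t
        print(t, binom_upper_tail(n, p, k), chernoff_bound(n, p, t))
```

The exact tail should never exceed the bound; the gap shows how much is lost by using the closed-form estimate.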
Given a subset of the attributes $\{a_{i_1}, a_{i_2}, \ldots, a_{i_k}\}$, it will be useful to
approximate the number of vertices likely to be connected to at least one of the
given attributes. To find an upper bound on the number of vertices, we first
estimate

$$W = \sum_{i \in \{i_1, i_2, \ldots, i_k\}} p_i$$

and then use Equation (2.1). To find good approximations of W as above,
we need the following useful generalization of Equation (2.1) due to McDiarmid
[IT] and further generalized by Chung and Lu [7].

Theorem 2.1 ([7]). Suppose $Y_i$ are independent random variables with
$M_1 \le Y_i \le M_2$, for $1 \le i \le n$. Let $Y = \sum_{i=1}^n Y_i$ and $\|Y\| = \sqrt{\sum_{i=1}^n E[Y_i^2]}$. Then

$$P[Y \ge EY + \lambda] \le \exp\left(-\frac{\lambda^2}{2(\|Y\|^2 + M_2\lambda/3)}\right), \tag{2.2}$$

$$P[Y \le EY - \lambda] \le \exp\left(-\frac{\lambda^2}{2(\|Y\|^2 - M_1\lambda/3)}\right). \tag{2.3}$$



3 Branching Processes 

We shall make use of the theory of Galton-Watson branching processes. In
a single-type Galton-Watson branching process, each individual has descendants
given by a common distribution, Z. Standard results (see [12], Chapter
1) show that if the mean of Z is less than 1, the process dies out eventually,
while if the mean is greater than 1, there is a positive probability, given by
$1 - \rho$, that the process survives indefinitely. In this case, $\rho$ is the unique
solution in [0, 1) to the equation

$$x = \sum_{i=0}^{\infty} P[Z = i]\,x^i. \tag{3.1}$$

For a random intersection graph G with parameters n and $\mathbf{p} = (p_i)_{i=1}^m$, we
associate the Galton-Watson process where descendants are taken from
the probability distribution of the degree of a random vertex, v, in G (i.e.
$P[Z = k] = P[d(v) = k]$ for each k). Lemma 3.1 elucidates the relationship
between n, $\mathbf{p}$ and the associated Galton-Watson process.
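For a concrete offspring distribution, the extinction probability of a single-type Galton-Watson process is the smallest solution in [0, 1] of $x = \sum_i P[Z = i]x^i$, and it can be approximated by iterating the generating function from x = 0. The sketch below is illustrative only; the function name `gw_extinction` and the finite list representation of the offspring distribution are assumptions.

```python
def gw_extinction(offspring_pmf, tol=1e-12, max_iter=1_000_000):
    """Smallest solution in [0, 1] of x = sum_i pmf[i] * x**i.

    offspring_pmf[i] is P[Z = i]; the list must sum to 1.
    The iterates x_{k+1} = g(x_k), x_0 = 0, increase to the extinction probability.
    """

    def pgf(x):
        acc = 0.0
        for coeff in reversed(offspring_pmf):   # Horner evaluation
            acc = acc * x + coeff
        return acc

    x = 0.0
    for _ in range(max_iter):
        nxt = pgf(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x


if __name__ == "__main__":
    # Mean offspring 1.25 > 1; the equation x = 1/4 + x/4 + x^2/2 has roots 1/2 and 1.
    print(gw_extinction([0.25, 0.25, 0.5]))
```

For the example distribution the iteration approaches 1/2, the root in [0, 1); for a distribution with mean at most 1 it approaches 1.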
Lemma 3.1. Let n, $\mathbf{p} = (p_1, p_2, \ldots, p_{m_1})$ and $\mathbf{q} = (q_1, q_2, \ldots, q_{m_2})$ be given
such that there exists an $S \subset [m_2]$ and a one-to-one map $\pi : [m_2]\setminus S \to [m_1]$
with the properties that

(i) $\forall j \in [m_2]\setminus S$, $p_{\pi(j)} = q_j$,
(ii) $\forall j \in S$, $\forall i \in [m_1]\setminus \pi([m_2]\setminus S)$, $p_i \ge q_j$,
(iii) $\sum_{i=1}^{m_1} p_i^2 = \sum_{j=1}^{m_2} q_j^2$.

If $\mathcal{X}$ and $\mathcal{Y}$ are the Galton-Watson processes associated with $\mathbf{p}$, $\mathbf{q}$ respectively,
then the probability that $\mathcal{X}$ dies out is at least as large as the probability that
$\mathcal{Y}$ dies out.

Proof. Let X and Y be the degree distributions for the degrees of an arbitrary
vertex in the random intersection graphs with parameters n, $\mathbf{p}$ and
n, $\mathbf{q}$ respectively. Then $\mathcal{X}$ and $\mathcal{Y}$ are the Galton-Watson processes where
each generation is chosen independently from X, respectively Y. Writing
$x_i = P[X = i]$ and $y_i = P[Y = i]$ it is easy to see that

$$\sum_{i=0}^{n} i\,x_i = \sum_{i=0}^{n} i\,y_i, \tag{3.2}$$

which is equivalent to condition (iii). Moreover, conditions (i)-(iii) imply
that $x_0 \ge y_0$ and that there exists an $I > 0$ with

$$\forall i,\ 0 < i < I,\ x_i \le y_i \quad\text{and}\quad \forall i,\ i \ge I,\ x_i \ge y_i. \tag{3.3}$$

It is now easy to show that $f(z) = \sum_{i=0}^{\infty} x_i z^i$ dominates $g(z) = \sum_{i=0}^{\infty} y_i z^i$
on the interval [0, 1]. Indeed writing $h(z) = f(z) - g(z)$ we have
$h(0) = x_0 - y_0 \ge 0$ while $h(1) = 0$. Then Equation (3.2) and Statement (3.3) imply
that for $0 \le z \le 1$, $h'(z) \le h'(1) = 0$. That is, h is decreasing on [0, 1], hence
$h(z) \ge 0$ for $z \in [0, 1]$. In particular if the expected values $f'(1) = g'(1)$ are
greater than 1, then there is a non-zero probability $1 - \rho$ that the process
$\mathcal{Y}$ survives. In this case, $\rho$ satisfies the equation $\rho = g(\rho)$. Then $f(\rho) \ge \rho$,
which implies that the solution to the equation $z = f(z)$ is at least $\rho$. Then
the probability that $\mathcal{X}$ dies out is at least as large as the probability that $\mathcal{Y}$ dies
out. □
3.1 Multi-type Galton-Watson Processes

It will be convenient to consider multi-type Galton-Watson processes as well.
For a random intersection graph with parameters n and $\mathbf{p} = (p_i)_{i=1}^m$, we
associate the following m + 1 type Galton-Watson process. Individuals of
type 0 relate to the vertices in the associated random bipartite graph B,
while all other individuals in this process relate to attributes of B. Moreover,
individuals of type 0 can have offspring of each of the types 1, 2, ..., m;
the amount is taken from $\mathrm{Bernoulli}(p_1), \mathrm{Bernoulli}(p_2), \ldots, \mathrm{Bernoulli}(p_m)$ respectively.
Individuals of types i = 1, 2, ..., m only have offspring of type 0, where the amount
is taken from the distribution $\mathrm{Bin}(n, p_i)$. The process starts with one individual of
type 0. We review standard results [12] which imply that if $n\sum_i p_i^2 = c > 1$
then this multi-type process survives with positive probability $1 - \rho$ where $\rho$
is given by the unique solution in (0, 1) to Equation (1.1). Note that for a
given parameter set, the associated single-type Galton-Watson process and
multi-type Galton-Watson process have the same probabilities of survival
and extinction.

Consider a general multi-type Galton-Watson process with m + 1 types
labeled 0, 1, ..., m. For each positive integer N and each type i, define
$f_N^i(x_0, x_1, \ldots, x_m)$ to be the generating function for the descendants at time
N given that the process started with exactly one individual of type i. That
is, let $p_N^i(r_0, r_1, \ldots, r_m)$ represent the probability that the process starting
with one individual of type i will in the Nth generation have $r_0, r_1, \ldots, r_m$
offspring of types 0, 1, ..., m respectively. Then the generating functions can
be expressed as

$$f_N^i(x_0, x_1, \ldots, x_m) = \sum_{r_0=0}^{\infty}\sum_{r_1=0}^{\infty}\cdots\sum_{r_m=0}^{\infty} p_N^i(r_0, r_1, \ldots, r_m)\,x_0^{r_0}x_1^{r_1}\cdots x_m^{r_m}.$$

Writing $\mathbf{x} = (x_0, x_1, \ldots, x_m)$ and $\mathbf{f}_N(\mathbf{x}) = (f_N^0(\mathbf{x}), f_N^1(\mathbf{x}), \ldots, f_N^m(\mathbf{x}))$, we have

$$\mathbf{f}_{N+1}(\mathbf{x}) = \mathbf{f}_1(\mathbf{f}_N(\mathbf{x})). \tag{3.4}$$

From the definition, $f_N^i(\mathbf{0})$ is exactly the probability of extinction by the
Nth generation if the process starts with one individual of type i. Thus
$f_{N+1}^i(\mathbf{0}) \ge f_N^i(\mathbf{0})$ and in particular, $\lim_N f_N^i(\mathbf{0})$ exists and is less than or equal
to 1. Writing $q_i = \lim_N f_N^i(\mathbf{0})$ we see that $\mathbf{q} = (q_0, q_1, \ldots, q_m)$ is a solution to
the equation

$$\mathbf{f}_1(\mathbf{q}) = \mathbf{q}. \tag{3.5}$$
Let $m_{ij}$ be the expected number of offspring of type j from an individual
of type i and let $M = (m_{ij})$ be the matrix of these first moments. Suppose
$\mathbf{s}$ is a vector with $|1 - s_i| \le 1$ for each i. Then from Taylor's theorem with
remainder we have

$$\mathbf{f}_N(\mathbf{1} - \mathbf{s}) = \mathbf{1} - M^N\mathbf{s} + o(|\mathbf{s}|).$$

Assume that for any such vector $\mathbf{s}$, there exists an $N_0$ with $|M^{N_0}\mathbf{s}| > 2|\mathbf{s}|$.
Then it follows that there exists a nonnegative solution different from $\mathbf{1}$ to
Equation (3.5). Indeed fix $\varepsilon > 0$. If $\mathbf{q} = \mathbf{1}$, then there exists a sufficiently
large N such that $|\mathbf{1} - \mathbf{f}_N(\mathbf{0})| < \varepsilon$. Using $\mathbf{s} = \mathbf{1} - \mathbf{f}_N(\mathbf{0})$ we conclude that

$$|\mathbf{1} - \mathbf{f}_{N+N_0}(\mathbf{0})| = |\mathbf{1} - \mathbf{f}_{N_0}(\mathbf{f}_N(\mathbf{0}))| > |\mathbf{1} - \mathbf{f}_N(\mathbf{0})|, \tag{3.6}$$

a contradiction to the fact that $\mathbf{q} - \mathbf{f}_N(\mathbf{0}) \to 0$ monotonically as $N \to \infty$.

If, in addition to the above assumption, we also assume that $q_i < 1$ for all
i, it follows that if $\mathbf{q}_1$ is any vector in the unit cube not equal to $\mathbf{1}$, we have
$\mathbf{f}_N(\mathbf{q}_1) \to \mathbf{q}$ as $N \to \infty$. (For a proof, see II.7.2 in [12].) On the other hand,
if there exist two types, i and j, with $q_i = 1$ and $q_j \ne 1$, then Equations (3.4)
and (3.5) imply that for all N, $1 = f_N^i(\mathbf{q})$. In particular $f_N^i(\mathbf{x})$ is thus
independent of $x_j$, which implies that $(M^N)_{ij} = 0$ for all N.

Corollary 3.2. If n and $\mathbf{p}$ are given with $n\sum_i p_i^2 = c > 1$ then the associated
multi-type Galton-Watson process, as defined above, survives with probability
$1 - \rho$ where $\rho$ is the unique solution in [0, 1) to Equation (1.1).

Proof. Without loss of generality, we suppose the $p_i$ are all nonzero. Note
that the probability generating functions associated with the multi-type
Galton-Watson process are

$$f_1^0(\mathbf{x}) = \prod_{i=1}^{m}(1 - p_i + p_i x_i), \qquad f_1^i(\mathbf{x}) = (1 - p_i + p_i x_0)^n \quad (i = 1, \ldots, m).$$

Thus the solution to Equation (3.5) is exactly given by Equation (1.1).

To show that this gives the extinction probability of the branching process,
it remains to verify the following two assumptions. First, that for any
vector $\mathbf{s}$ sufficiently close to 1, there exists an N with $|M^N\mathbf{s}| > 2|\mathbf{s}|$. Secondly,
that for each pair i, j there exists an N such that $(M^N)_{ij} \ne 0$.

Note that for i, j > 0, $m_{ij} = 0$, while for i > 0, $m_{0i} = p_i$, $m_{i0} = np_i$ and
for all i, $m_{ii} = 0$. As $n\sum_i p_i^2 = c > 1$, we have $(M^{2N})_{00} = c^N$, which
clearly implies the first statement. Secondly, it is not hard to check that
$(M^{2k+1})_{ij} \ne 0$ when exactly one of i, j is equal to 0. On the other hand,
$(M^{2k})_{ij} \ne 0$ when i, j > 0 and when i = j = 0. Thus the second assumption
above also holds and we can conclude that the unique solution in (0, 1) to
Equation (1.1) is indeed the probability of extinction when the process starts
with one individual of type 0. □
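The first-moment computation in the proof of Corollary 3.2 is easy to check numerically. The sketch below builds the matrix M with $m_{0i} = p_i$ and $m_{i0} = np_i$ (all other entries zero) and confirms that $(M^2)_{00} = n\sum_i p_i^2$. The helper names `mean_matrix`, `mat_mul` and `mat_pow` are hypothetical, and plain nested lists are used to keep the example dependency-free.

```python
def mean_matrix(n, probs):
    """First-moment matrix of the (m+1)-type process: type 0 = vertices,
    types 1..m = attributes."""
    m = len(probs)
    M = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i, p in enumerate(probs, start=1):
        M[0][i] = p          # a vertex has a Bernoulli(p_i) child of type i
        M[i][0] = n * p      # attribute i has Bin(n, p_i) children of type 0
    return M


def mat_mul(A, B):
    """Plain matrix product of two square lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_pow(M, k):
    """k-th power of M by repeated multiplication (k >= 0)."""
    size = len(M)
    R = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(k):
        R = mat_mul(R, M)
    return R


if __name__ == "__main__":
    n, probs = 50, [0.1, 0.2, 0.05]
    c = n * sum(q * q for q in probs)
    M = mean_matrix(n, probs)
    print(c, mat_pow(M, 2)[0][0], mat_pow(M, 6)[0][0], c ** 3)
```

Odd powers of M have a zero (0, 0) entry because every path from type 0 back to type 0 alternates between vertices and attributes.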

Finally, we show here how to use Equation (1.1) to derive the size of the
largest component in the supercritical phase for the uniform homogeneous
case.

Proposition 3.3. Let m = m(n) be a sequence of integers indexed by n and
let c > 1 be given. Define $p = \sqrt{c/(mn)}$. Then the associated m + 1 type
Galton-Watson process eventually dies out with probability given by

$$\rho = \begin{cases} 1 - (1 - \zeta)mp & \text{if } m = o(n), \\ \zeta & \text{if } n = o(m), \\ \zeta^* & \text{if } m = \Theta(n), \end{cases} \tag{3.7}$$

where $\zeta$ and $\zeta^*$ are the unique solutions in (0, 1) to the equations
$\exp(c(x - 1)) = x$ and $\exp(mp(\exp(np(x - 1)) - 1)) = x$, respectively.

Proof. In each case we will use the fact that $1 - p = \exp(-p - o(p))$.
When mp = o(1), letting $\rho = 1 - (1 - \zeta)mp$, we have

$$\begin{aligned}
\big[1 - p\big(1 - (1 - p(1 - \rho))^n\big)\big]^m
&= \big[1 - p\big(1 - e^{-np(1-\rho)(1+o(1))}\big)\big]^m \\
&= \big[1 - p\big(1 - e^{-c(1-\zeta)(1+o(1))}\big)\big]^m \\
&= \big[1 - p\big(1 - \zeta(1 - o(1))\big)\big]^m \\
&= \exp\big[-mp\big(1 - \zeta(1 - o(1))\big)(1 + o(1))\big] \\
&= 1 - \big(1 - \zeta(1 - o(1))\big)mp\,(1 + o(1)) \to \rho.
\end{aligned} \tag{3.8}$$

Secondly, if np = o(1) then we have

$$\begin{aligned}
\big[1 - p\big(1 - (1 - p(1 - \zeta))^n\big)\big]^m
&= \big[1 - p\big(1 - e^{-np(1-\zeta)(1+o(1))}\big)\big]^m \\
&= \big[1 - np^2(1 - \zeta)(1 + o(1))\big]^m \\
&= \exp\big(-c(1 - \zeta)(1 + o(1))\big) \to \zeta.
\end{aligned} \tag{3.9}$$

Finally, if $n = \Theta(m)$, then np and mp are constants. We have

$$\begin{aligned}
\big[1 - p\big(1 - (1 - p(1 - \zeta^*))^n\big)\big]^m
&= \big[1 - p\big(1 - e^{-np(1-\zeta^*)(1+o(1))}\big)\big]^m \\
&= \exp\big(mp\big(e^{np(\zeta^*-1)(1+o(1))} - 1\big)(1 - o(1))\big).
\end{aligned} \tag{3.10}$$

□

Note that if n is replaced with $n(1 - o(1))$ and m with $m(1 - o(1))$, the
asymptotic results are the same.
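The asymptotic formulas of Proposition 3.3 can be compared with a direct numerical solution of the uniform fixed-point equation $\rho = [1 - p(1 - (1 - p(1-\rho))^n)]^m$. The sketch below is a rough check under the assumption $p = \sqrt{c/(mn)}$ with c > 1; the function names `rho_uniform` and `zeta` and the chosen sizes are illustrative only.

```python
import math


def rho_uniform(n, m, p, tol=1e-13, max_iter=1_000_000):
    """Iterate rho -> [1 - p(1 - (1 - p(1 - rho))^n)]^m starting from rho = 0."""
    rho = 0.0
    for _ in range(max_iter):
        inner = math.exp(n * math.log1p(-p * (1.0 - rho)))    # (1 - p(1 - rho))^n
        nxt = math.exp(m * math.log1p(-p * (1.0 - inner)))    # [...]^m
        if abs(nxt - rho) < tol:
            return nxt
        rho = nxt
    return rho


def zeta(c, tol=1e-13, max_iter=1_000_000):
    """Smallest root in (0, 1) of exp(c (x - 1)) = x, assuming c > 1."""
    x = 0.0
    for _ in range(max_iter):
        nxt = math.exp(c * (x - 1.0))
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x


if __name__ == "__main__":
    c = 2.0
    z = zeta(c)
    # Regime n = o(m): rho should be close to zeta.
    n, m = 100, 10**6
    p = math.sqrt(c / (n * m))
    print("n << m:", rho_uniform(n, m, p), "vs zeta =", z)
    # Regime m = o(n): 1 - rho should be close to (1 - zeta) * m * p.
    n, m = 10**6, 100
    p = math.sqrt(c / (n * m))
    print("m << n:", 1 - rho_uniform(n, m, p), "vs", (1 - z) * m * p)
```

At finite sizes the agreement is only approximate, since the proposition is an asymptotic statement.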

4 Proofs of Main Theorems 

4.1 Discovery Process 

For a random intersection graph G, we define the following discovery process.
Let B be the bipartite graph associated to G and let $v_1$ be a vertex (as
opposed to an attribute) of B. For i = 0, 1, 2, ... inductively define sets of
unsaturated vertices, discovered vertices and discovered attributes, denoted
by $U_i$, $V_i$, $A_i$, respectively. Initially set $A_0 = \emptyset$, and $V_0 = U_0 = \{v_1\}$. At step
i, if $U_{i-1}$ is empty the process terminates. Otherwise pick $v_i \in U_{i-1}$. Let
$A'_i$ denote the set of attributes connected to $v_i$ in $A\setminus A_{i-1}$. Thus $A'_i$ is the
set of newly discovered attributes. Discover next the vertices $V'_i$ of $V\setminus V_{i-1}$
connected to at least one attribute in $A'_i$. Let $X_i$ denote the cardinality of
$V'_i$. Note again that $X_i$ is the number of newly discovered vertices. Define
the sets

$$A_i = A_{i-1} \cup A'_i, \qquad V_i = V_{i-1} \cup V'_i, \qquad U_i = (U_{i-1}\setminus\{v_i\}) \cup V'_i.$$

A vertex or an attribute can only be discovered once. Crucially, the event
that the vertex $v_i$ is connected to an attribute $a \in A\setminus A_{i-1}$ is independent of
the history of the discovery process. Similarly, the event that an attribute
$a \in A'_i$ is connected to a vertex $v \in V\setminus V_{i-1}$ is independent of the history of
the discovery process.
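The discovery process can be sketched as a search on the bipartite graph. In the illustrative Python below, B is assumed to be given as a list whose entry v is the set of attributes of vertex v; the function name `discover_component` and this data layout are assumptions for the example, not notation from the paper. The sets `discovered_attrs`, `discovered` and `unsaturated` play the roles of $A_i$, $V_i$ and $U_i$.

```python
def discover_component(B, start):
    """Run the discovery process from vertex `start`.

    B[v] is the set of attributes attached to vertex v.
    Returns (component, counts) where counts[i-1] = X_i, the number of
    vertices newly discovered at step i.
    """
    members = {}                                  # attribute -> vertices attached to it
    for v, attrs in enumerate(B):
        for a in attrs:
            members.setdefault(a, []).append(v)

    discovered_attrs = set()                      # A_i
    discovered = {start}                          # V_i
    unsaturated = [start]                         # U_i
    counts = []                                   # X_1, X_2, ...
    while unsaturated:
        v = unsaturated.pop()                     # pick v_i in U_{i-1}
        new_attrs = B[v] - discovered_attrs       # A'_i: attributes seen for the first time
        discovered_attrs |= new_attrs
        new_vertices = set()                      # V'_i
        for a in new_attrs:
            new_vertices.update(u for u in members[a] if u not in discovered)
        discovered |= new_vertices
        unsaturated.extend(new_vertices)
        counts.append(len(new_vertices))
    return discovered, counts


if __name__ == "__main__":
    B = [{0}, {0, 1}, {1}, {2}, set()]
    component, counts = discover_component(B, 0)
    print(sorted(component), counts)
```

Each vertex of the component is saturated exactly once, so the number of steps equals the component size and the $X_i$ sum to the component size minus one.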
4.2 Subcritical phase

Proof of Theorem 1.1. Let $\mathbf{p} = (p_i)_{i=1}^m$ be given such that $c = n\sum_{i=1}^m p_i^2 < 1$. Let G
be a random intersection graph obtained from $\mathbf{p}$ and set $p = \max_i p_i$. Consider
the discovery process of G: if $A'_i$ is known but $V'_i$ has not yet been discovered,
define the random variables $W_i = \sum_{j \in A'_i} p_j$ and $X_i^+ \sim \mathrm{Bin}(n, W_i)$. $W_i$ can be
thought of as the weight of the attributes associated to the vertex $v_i$. Note
that $X_i^+$ stochastically dominates $X_i$, as each undiscovered vertex is connected
to at least one attribute of $A'_i$ with probability $1 - \prod_{j \in A'_i}(1 - p_j) \le W_i$.

The proof now follows from the following three claims.

Claim 4.1. $\sum_{i=1}^k X_i$ is stochastically dominated by $X^+_{(k)} \sim \mathrm{Bin}(n, \sum_{i=1}^k W_i)$.

Claim 4.2. Let $k \ge C\,np\log n$ for a constant $C > 2(4c+1)/(1-c)^2$. Then
$nP\big[\sum_{i=1}^k W_i \ge \frac{(1+c)k}{2n}\big] = o(1)$.

Claim 4.3. Let $k \ge (15\log n)/(1-c)^2$ and $X^+_{(k)} \sim \mathrm{Bin}(n, \sum_{i=1}^k W_i)$. If
$\sum_{i=1}^k W_i \le (k + kc)/(2n)$, then $nP[X^+_{(k)} \ge k - 1] = o(1)$.

Before we prove the claims, we show that they imply the theorem. First,
note that the probability that the component in G containing $v_1$ has size
at least k is bounded by $P[\sum_{i=1}^k X_i \ge k - 1]$. Claims 4.1 and 4.3 imply
that if $\sum_{i=1}^k W_i$ is small enough then all components have size $O(\log n)$.
However, to prove $\sum_{i=1}^k W_i$ is indeed small enough in Claim 4.2 we need
$k = \Omega(np\log n)$. As k is the upper bound on the component sizes, we
conclude that all components in G have size at most $O(\max\{np\log n, \log n\})$
as desired. □

We now prove Claims 4.1, 4.2 and 4.3.

Proof of Claim 4.1. It is clear that for each i, $X_i$ is stochastically dominated by $X_i^+$.
Similarly $\sum_{i=1}^k X_i$ is stochastically dominated by $X^+_{(k)} \sim \mathrm{Bin}(n, \sum_{i=1}^k W_i)$.

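Proof of Claim 4.2. Recall that $W_i$ is the weight of the attributes associated to $v_i$ in
the discovery process. As attributes can only be discovered once during the
process, $W_i \le \sum_{j=1}^m p_j X_{v_i,a_j}$. In particular
$\sum_{i=1}^k W_i \le \sum_{i=1}^k\sum_{j=1}^m p_j X_{v_i,a_j}$.
The last sum consists of km summands each of which is no greater than p.
Moreover,

$$E\Big[\sum_{i=1}^k\sum_{j=1}^m p_j X_{v_i,a_j}\Big] = k\sum_{j=1}^m p_j^2 = \frac{ck}{n}, \qquad
\Big\|\sum_{i=1}^k\sum_{j=1}^m p_j X_{v_i,a_j}\Big\|^2 = k\sum_{j=1}^m p_j^3 \le pk\sum_{j=1}^m p_j^2 = \frac{cpk}{n}.$$

Applying Theorem 2.1 with $M_2 = p$ and $\lambda = (1-c)k/(2n)$, we obtain

$$\begin{aligned}
nP\Big[\sum_{i=1}^k W_i \ge \frac{(1+c)k}{2n}\Big]
&\le nP\Big[\sum_{i=1}^k\sum_{j=1}^m p_j X_{v_i,a_j} \ge \frac{(1+c)k}{2n}\Big] \\
&\le n\exp\left(-\frac{(1-c)^2k^2}{(2n)^2\,2\big(cpk/n + p(1-c)k/(6n)\big)}\right) \\
&\le n\exp\left(-\frac{(1-c)^2k}{2np(4c+1)}\right) = o(1),
\end{aligned}$$

where the last equality follows for $k \ge C\,np\log n$ with $C > 2(4c+1)/(1-c)^2$. □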



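Proof of Claim 4.3. As $X^+_{(k)} \sim \mathrm{Bin}(n, \sum_{i=1}^k W_i)$ and $\sum_{i=1}^k W_i \le (k + kc)/(2n)$, it
follows that $X^+_{(k)}$ is stochastically dominated by $X^{++}_{(k)} \sim \mathrm{Bin}(n, (k + kc)/(2n))$.
By Chernoff's inequality, for n sufficiently large,

$$\begin{aligned}
P[X^{++}_{(k)} \ge k - 1]
&\le P\Big[X^{++}_{(k)} \ge \frac{(1+c)k}{2} + \Big(\frac{(1-c)k}{2} - 1\Big)\Big] \\
&\le \exp\left(-\frac{\big(\frac{(1-c)k}{2} - 1\big)^2}{(1+c)k + 2\big(\frac{(1-c)k}{2} - 1\big)/3}\right) \\
&\le \exp\left(-\frac{(1-c)^2k}{5(2+c)}\right) = o(n^{-1}),
\end{aligned}$$

where the last equality follows by letting $k \ge (15\log n)/(1-c)^2$. □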



4.3 Supercritical phase 



Proof of Theorem 1.2. Let $\gamma \in (1/2, 2/3)$ and $\mathbf{p} = (p_i)_{i=1}^m$ be given such
that $p = \max_i p_i = o(n^{-\gamma})$ and $\sum_{i=1}^m p_i^2 = c/n$ with constant c > 1. Let G
be the random intersection graph obtained. Consider the same discovery
process as defined in Section 4.1 on the associated random bipartite graph
B. In particular, recall that $A'_i$ is the set of newly discovered attributes at
step i and that $W_i = \sum_{j \in A'_i} p_j$. Let $k_- = C\max\{np\log n, \log n\}$ for a
sufficiently large constant C = C(c), and let $k_+ = n^{\gamma}$. The following is an
adaptation of standard results for the phase transition in Erdős-Rényi random
graphs (Theorem 5.4, [13]).

First note that for each $k \in [k_-, k_+]$, the following holds:

$$E\Big[\sum_{i=1}^k W_i\Big] \ge \frac{ck}{n}(1 - o(1)). \tag{4.1}$$

To see this, note that the probability that attribute j is discovered by the
kth step is $1 - (1 - p_j)^k$. Thus

$$E\Big[\sum_{i=1}^k W_i\Big] = \sum_{j=1}^m p_j\big(1 - (1 - p_j)^k\big)
= \sum_{j=1}^m p_j^2\big(1 + (1 - p_j) + \cdots + (1 - p_j)^{k-1}\big) \ge \frac{ck}{n}(1 - o(1)).$$

The last inequality follows from $p_j = o(n^{-\gamma})$, which implies $p_j k = o(1)$.
We now use Equation (4.1) to show that for each $k \in [k_-, k_+]$,

$$P\Big[\sum_{i=1}^k W_i \le \frac{ck}{n} - \frac{(c-1)k}{3n}\Big] = o(n^{-5/3}). \tag{4.2}$$

This follows from (2.3) by writing $\sum_{i=1}^k W_i = \sum_{j=1}^m p_j I_j$ with $I_j$ the indicator
random variable equal to 1 with probability $1 - (1 - p_j)^k$ and 0 otherwise.
Clearly, $p_j I_j \ge 0$ for each j and
$\|\sum_{i=1}^k W_i\|^2 = \sum_j E[(p_j I_j)^2] = \sum_j p_j^2[1 - (1 - p_j)^k]$. Thus

$$\begin{aligned}
P\Big[\sum_{i=1}^k W_i \le \frac{ck}{n} - \frac{(c-1)k}{3n}\Big]
&\le \exp\left(-\frac{(c-1)^2k^2}{9n^2\cdot 2\sum_j p_j^2[1 - (1 - p_j)^k]}\right) \\
&\le \exp\left(-\frac{(c-1)^2k^2}{18n^2\,kp\sum_j p_j^2}\right)
= \exp\left(-\frac{(c-1)^2k}{18cpn}\right) = o(n^{-5/3}).
\end{aligned}$$

Indeed, it follows from (2.2) and a similar derivation that for $k \in [k_-, k_+]$,

$$P\Big[\sum_{i=1}^k W_i \ge (2c-1)\frac{k}{n}\Big] = o(n^{-5/3}). \tag{4.3}$$



We now show that with high probability there are no components with
$k \in [k_-, k_+]$ vertices. In particular, we show that either the discovery process
terminates after fewer than $k_-$ steps, or that for each $k \in [k_-, k_+]$, there are at least
$(c-1)k/2$ unsaturated vertices. As $\sum_{i=1}^k X_i = |U_k| + k - 1$, it will be enough
to show that for each $k \in [k_-, k_+]$, with high probability, $\sum_{i=1}^k X_i$ is at least
$(c+1)k/2$. From Equation (4.2), with high probability the weight of the
discovered attributes after k steps will be at least $(2ck + k)/(3n)$.

As we only need to find $(c+1)k_+/2$ vertices, we can bound each $X_i$ from
below by $X_i^- \sim \mathrm{Bin}(n - (c+1)k_+/2, W_i)$. We further bound $\sum_{i=1}^k X_i$ from below
by $X^-_{(k)}$, where $X^-_{(k)} \sim \mathrm{Bin}(n - (c+1)k_+/2, p_k)$ and $p_k$ is defined as

$$p_k = \sum_{i=1}^k W_i - \Big(\sum_{i=1}^k W_i\Big)^2.$$

Equation (4.3) implies that with high probability $p_k = \sum_{i=1}^k W_i\,(1 - o(1))$.
In turn, Equation (4.2) then implies that

$$p_k \ge \frac{(2c+1)k}{3n}(1 - o(1)).$$

The probability there is a component of size between $k_-$ and $k_+$ is thus
bounded above in the following manner:

$$\begin{aligned}
n\sum_{k=k_-}^{k_+} P\Big[\sum_{i=1}^k X_i \le k - 1 + \frac{(c-1)k}{2}\Big]
&\le n\sum_{k=k_-}^{k_+} P\Big[X^-_{(k)} \le k - 1 + \frac{(c-1)k}{2}\Big] \\
&\le n\sum_{k=k_-}^{k_+} \exp\left(-\frac{(c-1)^2k^2}{25(2c+1)k}\right) \\
&\le nk_+\exp\left(-\frac{(c-1)^2k_-}{25(2c+1)}\right) = o(1).
\end{aligned}$$



We now show that if there is a component of size at least $k_+$, then it
is unique with high probability. Suppose that there are two vertices $v'$ and
$v''$ which belong to components of size at least $k_+$. Consider our discovery
process starting at $v'$. At the end of the $k_+$ step, there are at least $(c-1)k_+/2$
unsaturated vertices which belong to the component containing $v'$. Similarly,
if we consider the discovery process starting with $v''$, again there are at least
$(c-1)k_+/2$ unsaturated vertices in the component containing $v''$. Denote by
$A'$ and $A''$ the sets of attributes discovered by the $k_+$ step for each of the two
discovery processes, respectively. If the two components are distinct, then
in particular none of the unsaturated vertices in $V'$ are connected to any of
the unsaturated vertices in $V''$. With high probability this will not occur.
Indeed,

$$\begin{aligned}
P[V' \not\sim V''] &\le \prod_{i \notin (A' \cup A'')} (1 - p_i^2)^{\left(\frac{(c-1)k_+}{2}\right)^2}
\le \exp\left(-\Big(\frac{k_+(c-1)}{2}\Big)^2 \sum_{i \notin (A' \cup A'')} p_i^2\right) \\
&\le \exp\left(-\frac{(c-1)^2k_+^2}{4n}\right) = \exp\left(-\frac{(c-1)^2n^{2\gamma-1}}{4}\right) = o(n^{-2}).
\end{aligned}$$

The last inequality follows from the fact that $\sum_{i \notin (A' \cup A'')} p_i^2 \ge 1/n$.
Note that Equation (4.3) implies that $n\sum_{i \notin (A' \cup A'')} p_i^2 = c - o(1) > 1$.

We have yet to show that G contains a component of size at least $k_+$.
Denote by $\rho = \rho(n, \mathbf{p})$ the probability that a given vertex of the random
intersection graph will be in a small (i.e. of size at most $k_-$) component.
Now $\rho$ is bounded from below by the extinction probability $\rho_- = \rho_-(n, \mathbf{p})$ of
the associated multi-type branching process.

To bound $\rho$ from above, recall Equation (4.3), which implies that after $k_-$
steps of the discovery process, with high probability $\sum_{i=1}^{k_-} W_i \le (2c-1)k_-/n$.
Thus we bound $\rho$ by $\rho_+ = \rho_+(n - k_-, \mathbf{p}') + o(1)$ where $\mathbf{p}' = \{p_i \mid i \in A\setminus A'\}$
for a suitable set of attributes $A'$ such that $\sum_{i \in A'} p_i \le (2c-1)k_-/n$. Lemma
3.1 implies that $\rho_+$ is largest when $A'$ consists of the smallest (by weight)
elements of A. Thus assuming without loss of generality that the sequence $\mathbf{p}$
is monotone increasing, let l be maximal such that $\sum_{i=1}^{l} p_i \le (2c-1)k_-/n$.
Then if $A'$ consists of the first l attributes, and $\mathbf{p}' = (p_i)_{i>l}$, $\rho_+$ will be largest
given that $\sum_{i \in A'} p_i \le (2c-1)k_-/n$.

The definition of l implies that $n\sum_{i=1}^{l} p_i^2 = o(1)$ and thus $\sum_{i=1}^{l}\max\{p_i, np_i^2\} = o(1)$.
It follows from Equation (1.1) that in the limit as $n \to \infty$,
$\rho_+ = \rho_-(1 + o(1))$. Thus if Y denotes the number of vertices in small components
of G, then $E[Y] = \rho(1 + o(1))n$. To see that Y is strongly concentrated about its mean,
note that

$$E[Y^2] \le n\rho(n, \mathbf{p})k_- + n\rho(n, \mathbf{p})\,n\rho(n - k_-, \mathbf{p}) = (1 + o(1))E[Y]^2.$$

Thus the variance of Y is $o(E[Y]^2)$, and by Chebyshev's inequality Y is
concentrated about its mean, as desired. □

Proof of Theorem 1.3. The proof is exactly the same as above except that
we give a weaker upper bound for $\rho$. Let $G_1$ be the random intersection
graph with parameters $n - k_-$, $\mathbf{p}' = (p_i)_{i>l}$ as above. Let Y be the degree
distribution of $G_1$ and $\mathcal{Y}$ the associated single-type Galton-Watson branching
process. Let $G_2$ be the random intersection graph on $n - k_-$ vertices and
$m'$ attributes, each assigned the probability p, where $m'$ is chosen so that
condition (iii) of Lemma 3.1 holds. Let X be the degree
distribution of $G_2$ and $\mathcal{X}$ the corresponding Galton-Watson process. The
probability generating functions for $\mathcal{X}$ and $\mathcal{Y}$ will satisfy the conditions of
Lemma 3.1 and thus the extinction probability for $\mathcal{X}$ gives an upper bound on
the extinction probability for $\mathcal{Y}$ and thus an upper bound on the probability a
vertex in $G_1$ is in a small component. Applying Proposition 3.3 it follows that
the expected size of the largest component in G is at least $\Omega(\min\{p^{-1}, n\})$.

□